
196 research outputs found

The genetic organisation of prokaryotic two-component system signalling pathways

Author: Robert HN Williams
David E Whitworth
RB Bourret
C Fabret
T Mizuno
JM Skerker
K Yamamoto
MT Laub
DE Whitworth
L Li
M Weigt
L Burger
L Løvdok
PJ Piggot
R Paul
S Jagadeesan
S Wegener-Feldbrügge
PJA Cock
PI Higgs
DE Whitworth
N Majdalani
S Romagnoli
LE Ulrich
M Barakat
MY Galperin
DE Whitworth
MY Galperin
DE Whitworth
PJ Cock
DE Whitworth
PJA Cock
A Pallejà
Y Fukuda
PJA Cock
S Schübbe
JL Appleby
P Dam
M Pertea
I Macarthur
KA Walker
W Zhang
S Romagnoli
A Busch
LE Ulrich
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract
Background: Two-component systems (TCSs) are modular and diverse signalling pathways that involve a stimulus-responsive transfer of phosphoryl groups from transmitter domains to partner receiver domains. TCS gene and domain organisation are both potentially informative about biological function, interaction partnerships and molecular mechanisms. However, the relationships between domain architecture, gene organisation and TCS pathway structure are currently poorly understood.
Results: Here we classify the gene and domain organisation of TCS gene loci from 1405 prokaryotic replicons (>40,000 TCS proteins). We find that 200 bp is the most appropriate distance cut-off for deciding whether two TCS genes are functionally linked. More than 90% of all TCS gene loci encode just one or two transmitter and/or receiver domains; however, numerous other geometries exist, often with large numbers of encoded TCS domains. This information provides insights into how TCS domains are distributed between genes and within genes. As expected, the organisation of TCS genes and domains is affected by phylogeny, and plasmid-encoded TCSs are organised differently from their chromosomally encoded counterparts.
Conclusions: We provide an overview of the genomic and genetic organisation of TCS domains as a resource for further research. We also propose novel metrics that build on TCS gene/domain organisation data and allow genomic complements of TCSs to be compared. In particular, 'percentage orphaned TCS genes' (or 'Dissemination') and 'percentage of complex loci' (or 'Sophistication') appear to be useful discriminators, and they reflect mechanistic aspects of TCS organisation not captured by existing metrics.
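A minimal sketch of how a 200 bp distance cut-off could be used to group TCS genes into loci and compute a 'percentage orphaned TCS genes' value. It rests on several assumptions that do not come from the paper. The `TcsGene` record and both function names are hypothetical. An "orphaned" gene is taken to be one that ends up alone in a single-gene locus. Strand orientation and the circularity of many replicons are ignored. 'Percentage of complex loci' is left out because the abstract does not say what makes a locus complex.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TcsGene:
    """Hypothetical record for one TCS gene on a replicon."""
    name: str
    start: int  # leftmost coordinate on the replicon (bp)
    end: int    # rightmost coordinate, inclusive (bp)

def group_loci(genes: List[TcsGene], max_gap: int = 200) -> List[List[TcsGene]]:
    """Cluster genes whose intergenic distance is <= max_gap bp into the same locus."""
    loci: List[List[TcsGene]] = []
    locus_end = 0
    for gene in sorted(genes, key=lambda g: g.start):
        # Bases strictly between the current locus and this gene (negative if they overlap).
        gap = gene.start - locus_end - 1
        if loci and gap <= max_gap:
            loci[-1].append(gene)
            locus_end = max(locus_end, gene.end)
        else:
            loci.append([gene])
            locus_end = gene.end
    return loci

def percent_orphaned(genes: List[TcsGene], max_gap: int = 200) -> float:
    """Percentage of TCS genes that sit alone in a single-gene locus (assumed definition)."""
    if not genes:
        return 0.0
    loci = group_loci(genes, max_gap)
    orphans = sum(1 for locus in loci if len(locus) == 1)
    return 100.0 * orphans / len(genes)

genes = [
    TcsGene("A", 1, 900),
    TcsGene("B", 1001, 1800),  # 100 bp after A, so it joins A's locus
    TcsGene("C", 5000, 5900),
    TcsGene("D", 6200, 7000),  # 299 bp after C, beyond the 200 bp cut-off
]
print(len(group_loci(genes)), percent_orphaned(genes))
```

The cut-off is exposed as `max_gap` so that the sensitivity of such a metric to the linkage distance can be explored directly.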

Crossref

Aberystwyth Research Portal

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Online Research Database In Technology

SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data

Author: A Martínez-Alcántara
Daniel A Peterson
DR Zerbino
GJ Hannon
J Rougemont
JC Dohm
ML Metzker
Murray P Cox
P Pavlidis
Patrick J Biggs
PC Dolan
PJA Cock
R Development Core Team
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract
Background: Illumina's second-generation sequencing platform is playing an increasingly prominent role in modern DNA and RNA sequencing efforts. However, rapid, simple, standardized and independent measures of run quality are currently lacking, as are tools that use read-level quality data to process sequences for downstream applications.
Results: We present SolexaQA, a user-friendly software package that quickly and automatically generates detailed statistics and at-a-glance graphics of sequence data quality. The package also contains software to trim sequences dynamically, based on the quality scores of the bases within individual reads.
Conclusion: The SolexaQA package produces standardized outputs within minutes. This makes it easy to compare flow cell lanes and machine runs, and it provides immediate diagnostic information to guide the manipulation of sequence data for downstream analyses.
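The abstract does not spell out how reads are trimmed "dynamically" from per-base quality scores. This sketch assumes one common rule for such trimming: keep the longest contiguous run of bases whose Phred score exceeds a cutoff. It is an illustration under that assumption, not SolexaQA's own code. The function names and the default cutoff of 20 are hypothetical. Qualities are assumed to use Phred+33 encoding.

```python
from typing import List, Tuple

def phred33(qual_string: str) -> List[int]:
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(c) - 33 for c in qual_string]

def longest_good_segment(quals: List[int], cutoff: int) -> Tuple[int, int]:
    """Half-open (start, end) of the longest run of bases with quality > cutoff."""
    best_start, best_len = 0, 0
    run_start = 0
    for i, q in enumerate(quals):
        if q <= cutoff:
            run_start = i + 1  # a low-quality base breaks the current run
        elif i - run_start + 1 > best_len:
            best_start, best_len = run_start, i - run_start + 1
    return best_start, best_start + best_len

def dynamic_trim(seq: str, quals: List[int], cutoff: int = 20) -> str:
    """Trim a read to its longest high-quality segment (empty if none qualifies)."""
    start, end = longest_good_segment(quals, cutoff)
    return seq[start:end]

print(dynamic_trim("ACGTACGT", [30, 30, 10, 35, 35, 35, 5, 40]))
```

Keeping a single contiguous segment means the trimmed read never contains an internal low-quality base. The cost is that some high-quality bases at either end may be discarded.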

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

NGS QC Toolkit: A Toolkit for Quality Control of Next Generation Sequencing Data

Author: A Martinez-Alcantara
D Blankenberg
ER Mardis
M Margulies
M Morgan
MP Cox
Mukesh Jain
PJA Cock
R Garg
R Schmieder
Ravi K. Patel
RV Pandey
T Lassmann
Z Wang
Zhanjiang Liu
Publication venue: Public Library of Science
Publication date: 01/02/2012

Next-generation sequencing (NGS) technologies provide a high-throughput means of generating large amounts of sequence data. However, quality control (QC) of the sequence data generated by these technologies is essential for meaningful downstream analysis. Furthermore, highly efficient and fast processing tools are required to handle the large volume of data. Here, we have developed an application, NGS QC Toolkit, for quality checking and filtering of high-quality data. This toolkit is a standalone, open-source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. All the tools in the application have been implemented in the Perl programming language. The toolkit comprises user-friendly tools for QC of sequencing data generated on the Roche 454 and Illumina platforms, along with additional tools to aid QC (a sequence format converter and trimming tools) and analysis (statistics tools). A variety of options is provided so that QC can be performed with user-defined parameters. The toolkit is expected to be very useful for QC of NGS data, facilitating better downstream analysis.
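The abstract does not state how "high-quality" reads are selected. A common criterion, assumed here, is to keep a read only if at least a given percentage of its bases reach a minimum Phred score. This Python sketch illustrates that filter. It is not the toolkit's Perl implementation. The function names and the thresholds (Q20, 70%) are hypothetical values chosen for illustration.

```python
from typing import Iterable, List, Tuple

Read = Tuple[str, str, List[int]]  # (read id, sequence, Phred quality scores)

def is_high_quality(quals: List[int], min_q: int = 20, min_percent: int = 70) -> bool:
    """True if at least min_percent % of bases have quality >= min_q."""
    if not quals:
        return False
    good = sum(1 for q in quals if q >= min_q)
    # Integer arithmetic avoids floating-point rounding exactly at the threshold.
    return good * 100 >= min_percent * len(quals)

def filter_reads(reads: Iterable[Read], min_q: int = 20, min_percent: int = 70) -> List[Read]:
    """Keep only the reads that pass the user-defined quality thresholds."""
    return [r for r in reads if is_high_quality(r[2], min_q, min_percent)]

reads = [
    ("r1", "ACGTACGTAC", [30] * 7 + [10] * 3),  # exactly 70% of bases >= Q20
    ("r2", "ACGTACGTAC", [30] * 6 + [10] * 4),  # only 60% of bases >= Q20
]
print([name for name, _, _ in filter_reads(reads)])
```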

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

interPopula: a Python API to access the HapMap Project dataset

Author: B Peng
B Rhead
D Rios
D Smedley
F Hsu
F Rousset
GA Thorisson
IH Consortium
J Akey
JD Hunter
JE Stajich
JL Kelley
LD Stein
PJA Cock
SA Tishkoff
TE Oliphant
Tiago Antao
V Curwen
VJ Carey
Publication venue: BioMed Central
Publication date: 01/12/2010

Abstract
Background: The HapMap project is a publicly available catalogue of common genetic variants that occur in humans, currently including several million SNPs across 1115 individuals spanning 11 different populations. This important database does not provide any programmatic access to the dataset; furthermore, no standard relational database interface is provided.
Results: interPopula is a Python API for accessing the HapMap dataset. interPopula integrates with both the Python software ecosystem (e.g. Biopython and matplotlib) and other relevant human population datasets (e.g. Ensembl gene annotation and UCSC Known Genes). A set of guidelines and code examples for addressing possible inconsistencies across heterogeneous data sources is also provided.
Conclusions: interPopula is a straightforward and flexible Python API that facilitates the construction of scripts and applications that require access to the HapMap dataset.

LSTM Online Archive

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

WeBIAS: a web server for publishing bioinformatics applications

Author: A Papanicolaou
B Giardine
B Néron
B Wilczyński
Bartek Wilczyński
Bogdan Lesyng
J Ren
P Daniluk
Paweł Daniluk
PJA Cock
T Oinn
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Pydna: a simulation and documentation tool for DNA assembly strategies using python

Author: B Johansson
Björn Johansson
C Rossant
D Schlieper
DG Gibson
E Appleton
E Iizasa
EL Ivanov
F Pérez
Filipa Pereira
Flávio Azevedo
Gabriela F Ribeiro
J Kärkkäinen
Mark W Budde
NJ Hillson
O Poch
PJA Cock
PMR Guimarães
Y Dharmadi
YD Chang
Z Shao
Ângela Carvalho
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Background: Recent advances in synthetic biology have provided tools to efficiently construct complex DNA molecules, which are an important part of many molecular biology and biotechnology projects. Such constructs have traditionally been planned manually in a DNA sequence editor, an approach that becomes error-prone as the scale and complexity of the construction increase. A human-readable formal description of cloning and assembly strategies that also allows automatic computer simulation and verification would therefore be a valuable tool.
Results: We have developed pydna, an extensible, free and open-source Python library for simulating basic molecular biology DNA unit operations such as restriction digestion, ligation, PCR, primer design, Gibson assembly and homologous recombination. A cloning strategy expressed as a pydna script provides a description that is complete, unambiguous and stable. Executing the script automatically yields the sequence of the final molecule(s) and of any intermediate constructs. Pydna has been designed to be understandable by biologists with limited programming skills, providing interfaces that are semantically similar to the descriptions of molecular biology unit operations found in the literature.
Conclusions: Pydna simplifies both the planning and the sharing of cloning strategies and is especially useful for complex or combinatorial DNA molecule construction. An important difference from existing tools with similar goals is the use of Python instead of a specifically constructed language, which provides a simulation environment that is more flexible and more extensible by the user.
Thanks to Dr. Aric Hagberg (Los Alamos National Laboratory, U.S.A.) and Sergio Simoes (Universidade de Sao Paulo, Brazil) for help with NetworkX and graph theory in general. Thanks to Henrik Bengtsson (Dept of Epidemiology & Biostatistics, University of California San Francisco, U.S.A.) for critical reading of the manuscript. Thanks to the 2013 Bioinformatics 6605 N4 students A. Coelho, A. Faria, A. Neves, D. Yelshyna and E. Costa for testing. This work was supported by the Fundacao para a Ciencia e Tecnologia (FCT) [PTDC/AAC-AMB/120940/2010, EXPL/BBB-BIO/1772/2013] and the FEDER POFC-COMPETE [PEst-C/BIA/UI4050/2011]. FA and GR were supported by FCT fellowships [SFRH/BD/80934/2011 and SFRH/BD/42565/2007, respectively].
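To make the idea of "simulating a DNA unit operation" concrete, here is a minimal plain-Python sketch of in-silico PCR on a linear template. It deliberately does not use pydna, and it is not pydna's API. The function names are hypothetical. The sketch also simplifies heavily: primers must match the template fully and exactly, each must bind only once, and 5' tails, mismatches and circular templates are not handled.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def simulate_pcr(template: str, fwd: str, rev: str) -> str:
    """Return the amplicon produced from a linear template by two fully matching primers."""
    template, fwd, rev = template.upper(), fwd.upper(), rev.upper()
    # The reverse primer binds the bottom strand, so on the top strand it
    # appears as its reverse complement.
    rev_site = reverse_complement(rev)
    if template.count(fwd) != 1 or template.count(rev_site) != 1:
        raise ValueError("each primer must bind the template exactly once")
    start = template.index(fwd)
    rev_pos = template.index(rev_site)
    if rev_pos < start:
        raise ValueError("primers do not face each other on this template")
    # The product spans from the forward primer through the reverse primer site.
    return template[start:rev_pos + len(rev_site)]

print(simulate_pcr("TTTTATGCCGAGAGACCGTATTTT", "ATGCC", "TACGG"))
```

Writing the operation as a function, rather than editing sequences by hand, is what makes a cloning strategy re-executable and checkable. That is the motivation the abstract gives for pydna scripts.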

Universidade do Minho: RepositoriUM

Crossref

Springer - Publisher Connector

PubMed Central

Caltech Authors

Microbiome profiling by Illumina sequencing of combinatorial sequence-tagged PCR products

Author: A Chao
A Mortazavi
A Nocker
AF Andersson
Andrew D. Fernandes
B Efron
C Camacho
C Quince
CL Lauber
DA Benson
DJG Lahr
DN Frank
DR Bentley
DR Smith
EP Smith
Frank R. DeLeo
Gregor Reid
Gregory B. Gloor
J Oksanen
J Pawlowski
J Ravel
J Reeder
J Schellenberg
Jean M. Macklaim
JF Petrosino
JG Caporaso
JR Cole
LA Amaral-Zettler
M Hamady
N Whiteford
PJA Cock
PN Polymenakou
R Colwell
R Hummelen
Roderick MacPhee
Ruben Hummelen
Russell J. Dickson
S Hurlbert
S Rodrigue
S Srinivasan
SF Altschul
SM Huse
V Laurikari
WR Engels
Y Shi
Y Wang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 28/07/2010

We developed a low-cost, high-throughput microbiome profiling method that uses combinatorial sequence tags attached to PCR primers that amplify the rRNA V6 region. Amplified PCR products are sequenced using an Illumina paired-end protocol to generate millions of overlapping reads. Combinatorial sequence tagging can be used to examine hundreds of samples with far fewer primers than are required when sequence tags are incorporated at only a single end. The number of reads generated permitted saturating or near-saturating analysis of vaginal microbiome samples. The large number of reads allowed an in-depth analysis of errors, and we found that PCR-induced errors made up the vast majority of non-organism-derived species variants, an observation that has significant implications for sequence clustering of similar high-throughput data. We show that the short reads are sufficient to assign organisms to the genus or species level in most cases. We suggest that this method will be useful for deep sequencing of any short nucleotide region that is taxonomically informative; these include the V3 and V5 regions of the bacterial 16S rRNA genes and the eukaryotic V9 region, which is gaining popularity for sampling protist diversity.
Comment: 28 pages, 13 figures
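Combinatorial tagging saves primers because each sample is identified by a pair of tags, one at each end of the amplicon. With n left tags and m right tags, n × m samples can be distinguished using only n + m tagged primers. The sketch below assigns read pairs to samples under that scheme. The assumptions are labelled here and are not from the paper: each tag sits at the very start of its read, all tags share a fixed length, matches must be exact, and the sample names and helper functions are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

SampleSheet = Dict[Tuple[str, str], str]

def build_sample_sheet(left_tags: List[str], right_tags: List[str]) -> SampleSheet:
    """Assign one (hypothetical) sample name to every (left tag, right tag) combination."""
    return {pair: f"sample_{i}"
            for i, pair in enumerate(product(left_tags, right_tags), start=1)}

def demultiplex(read1: str, read2: str, tag_len: int,
                sheet: SampleSheet) -> Optional[Tuple[str, str, str]]:
    """Return (sample, read1 without tag, read2 without tag), or None if the tag pair is unknown."""
    sample = sheet.get((read1[:tag_len], read2[:tag_len]))
    if sample is None:
        return None
    return sample, read1[tag_len:], read2[tag_len:]

# 2 left tags x 3 right tags = 6 samples from only 5 tagged primers.
sheet = build_sample_sheet(["ACGT", "TGCA"], ["GGAA", "CCTT", "AATT"])
print(len(sheet), demultiplex("TGCAGATTACA", "CCTTGGGCCC", 4, sheet))
```

A production pipeline would probably need to tolerate sequencing errors in the tags. That matters all the more given the abstract's finding that PCR-induced errors dominate the non-organism-derived variants.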

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Scholarship@Western

Crossref

Directory of Open Access Journals

PubMed Central

Erasmus University Digital Repository

e-MIR2: a public online inventory of medical informatics resources

Author: A Hotho
C Burchill
Casimir Kulikowski
D De la Iglesia
E Loper
F Lewitter
G De la Calle
Guillermo de la Calle
I Peters
ID Dinov
JD Tenenbaum
JY Bansard
M García-Remesal
M Lutz
MD Brazas
MF Porter
Miguel García-Remesal
MJ Schuemie
N Cannata
Nelida Nkumu-Mbomio
NM Lorenzi
PJA Cock
RBB Fitzpatrick
Science Staff: Dealing with data
Victor Maojo
YB Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Background. In recent years, the number of available informatics resources in medicine has grown exponentially. While specific inventories of such resources have already begun to be developed for Bioinformatics (BI), comparable inventories are not yet available for the Medical Informatics (MI) field, so locating and accessing MI resources remains a hard and time-consuming task. Description. We have created a repository of MI resources drawn from the scientific literature, providing free access to its contents through a web-based service. Relevant information describing the resources is automatically extracted from manuscripts published in top-ranked MI journals. We used a pattern-matching approach to detect the resources' names and their main features. Detected resources are classified according to three different criteria: functionality, resource type and domain. To support this classification, we built three taxonomies following a novel approach based on folksonomies and social tagging. We adopted the terminology most frequently used by MI researchers in their publications to create the concepts and hierarchical relationships of the taxonomies. The classification algorithm identifies the categories associated with each resource and annotates it accordingly. The database is then populated with these data after manual curation and validation. Conclusions. We have created an online repository of MI resources to help researchers locate and access the most suitable resources for specific tasks. The database contained 282 resources at the time of writing. We are continuing to expand the number of available resources by taking into account further publications as well as suggestions from users and resource developers.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivo Digital UPM

TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets

Author: A Djikeng
AH Wright
AR Quinlan
C Quince
D Gusfield
DB Jaffe
EA Dinsdale
Forest Rohwer
G Myers
G Navarro
GR Reyes
J Falgueras
JC Dohm
JR Cole
M Margulies
P Froussard
P Schloss
PD Schloss
PJA Cock
RA Baeza-Yates
Robert Edwards
Robert Schmieder
RV Thurber
S Diguistini
S Huse
S Nakamura
SG Tringe
V Kunin
Y Chen
Yan Wei Lim
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract
Background: Sequencing metagenomes that were pre-amplified with primer-based methods requires the removal of the additional tag sequences from the datasets. The sequenced reads can contain deletions or insertions due to sequencing limitations, and the primer sequence may contain ambiguous bases. Furthermore, the tag sequence may be unavailable or incorrectly reported. Because unwanted sequence contamination can introduce downstream inaccuracies, it is important to use reliable tools for pre-processing sequence data.
Results: TagCleaner is a web application developed to automatically identify and remove known or unknown tag sequences, allowing for insertions and deletions in the dataset. TagCleaner is also designed to filter the trimmed reads, removing duplicates, short reads, and reads with high rates of ambiguous bases. An additional screening for, and splitting of, fragment-to-fragment concatenations that gave rise to artificial concatenated sequences can increase the quality of the dataset. Users may adjust the different filter parameters according to their own preferences.
Conclusions: TagCleaner is a publicly available web application that can automatically detect and efficiently remove tag sequences from metagenomic datasets. It is easily configurable and provides a user-friendly interface. The interactive web interface supports exporting the results for subsequent data processing and is available at http://edwards.sdsu.edu/tagcleaner.
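Removing a known tag "allowing insertions and deletions" amounts to approximate matching of the tag against the start of each read. The sketch below does this with an edit-distance (Levenshtein) table. It compares the whole tag with every read prefix and trims the best-matching prefix if its distance is within a tolerance. This is a generic technique chosen for illustration, not TagCleaner's published algorithm. The function names and the default `max_edits=2` are hypothetical. Ambiguity handling is limited to `N` in the tag matching any base.

```python
def _bases_match(tag_base: str, read_base: str) -> bool:
    """An 'N' in the tag is treated as matching any read base."""
    return tag_base == "N" or tag_base == read_base

def trim_5prime_tag(read: str, tag: str, max_edits: int = 2) -> str:
    """Remove a 5' tag from read, tolerating up to max_edits substitutions/indels."""
    n, m = len(tag), len(read)
    # prev[j] = edit distance between tag[:i] and read[:j]; row i = 0 costs j.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if _bases_match(tag[i - 1], read[j - 1]) else 1
            cur[j] = min(prev[j - 1] + cost,  # match / substitution
                         prev[j] + 1,         # tag base missing from read (deletion)
                         cur[j - 1] + 1)      # extra base in read (insertion)
        prev = cur
    # Best read prefix for the whole tag; ties prefer a prefix length close to the tag length.
    best_j = min(range(m + 1), key=lambda j: (prev[j], abs(j - n)))
    return read[best_j:] if prev[best_j] <= max_edits else read

print(trim_5prime_tag("ACGTGGGCCC", "ACGT"))  # exact tag
print(trim_5prime_tag("ACTGGGCCC", "ACGT"))   # tag with one deleted base
```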

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MEvoLib v1.0: the first molecular evolution library for Python

Author: A Löytynoja
A Stamatakis
DA Bader
DA Benson
Eduardo Ruiz-Pesini
F Sievers
Jorge Álvarez-Jarreta
K Katoh
MN Price
MS Swenson
PJA Cock
RC Edgar
S Guindon
S Nelesen
Z Yang
Publication venue: 'Springer Science and Business Media LLC'

Crossref